cross-entropy method
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Louisiana (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.47)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Transportation (0.46)
- Automobiles & Trucks (0.46)
Post: Device Placement with Cross-Entropy Minimization and Proximal Policy Optimization
Yuanxiang Gao, Li Chen, Baochun Li
Training deep neural networks requires an exorbitant amount of computational resources, including a heterogeneous mix of GPU and CPU devices. It is critical to place the operations of a neural network on these devices in an optimal way, so that training can complete in the shortest amount of time. The state of the art uses reinforcement learning to learn placement skills by repeatedly performing Monte Carlo experiments. However, due to its equal treatment of placement samples, we argue that there remains ample room for significant improvement.
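The placement search sketched in this abstract can be made concrete with a small example. The Python snippet below is a generic cross-entropy-style placement search, not the paper's Post system (which couples cross-entropy minimization with proximal policy optimization): each operation keeps a categorical distribution over devices, and the distributions are refit on the fastest sampled placements. The `simulate_runtime` cost model is a hypothetical stand-in for a real profiler or simulator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ops, n_devices = 12, 4

def simulate_runtime(placement):
    # Hypothetical cost model: device 0 is slower, and every cross-device
    # edge between consecutive operations pays a communication penalty.
    op_cost = 1.0 + 0.1 * (placement == 0)
    comm = 0.5 * np.sum(placement[1:] != placement[:-1])
    return op_cost.sum() + comm

# One categorical distribution over devices per operation.
probs = np.full((n_ops, n_devices), 1.0 / n_devices)
for _ in range(30):
    samples = np.array([
        [rng.choice(n_devices, p=probs[i]) for i in range(n_ops)]
        for _ in range(64)
    ])
    runtimes = np.array([simulate_runtime(s) for s in samples])
    elites = samples[np.argsort(runtimes)[:8]]   # keep the 8 fastest placements
    for i in range(n_ops):                       # refit each categorical on the elites
        counts = np.bincount(elites[:, i], minlength=n_devices)
        probs[i] = 0.9 * counts / counts.sum() + 0.1 * probs[i]  # smoothed update

best = probs.argmax(axis=1)
print("greedy placement:", best, "runtime:", simulate_runtime(best))
```

Keeping 10% of the old distribution in the refit is a common CEM stabilizer; it prevents device probabilities from collapsing to zero before the search has explored enough placements.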
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Louisiana (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > Michigan (0.04)
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Learning to Design Soft Hands using Reward Models
Bai, Xueqian, Hansen, Nicklas, Singh, Adabhav, Tolley, Michael T., Duan, Yan, Abbeel, Pieter, Wang, Xiaolong, Yi, Sha
Amazon FAR (Frontier AI & Robotics)
Figure 1: We present a Cross-Entropy Method (CEM) with reward model (CEM-RM) framework that optimizes block-wise, finger-wise, and tendon-routing design distributions of a soft robotic hand using pre-collected teleoperation data. Hardware experiments demonstrate that CEM-RM achieves effective design optimization with significantly fewer samples than pure optimization, enabling robust grasping of challenging objects.
Soft robotic hands promise to provide compliant and safe interaction with objects and environments. However, designing soft hands to be both compliant and functional across diverse use cases remains challenging. Although co-design of hardware and control better couples morphology to behavior [1], the resulting search space is high-dimensional, and even simulation-based evaluation is computationally expensive. In this paper, we propose a Cross-Entropy Method with Reward Model (CEM-RM) framework that efficiently optimizes tendon-driven soft robotic hands based on a teleoperation control policy, reducing design evaluations by more than half compared to pure optimization while learning a distribution of optimized hand designs from pre-collected teleoperation data. We derive a design space for a soft robotic hand composed of flexural soft fingers and implement parallelized training in simulation.
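To make the CEM-RM idea concrete, here is a minimal Gaussian-CEM sketch in the same spirit, not the authors' implementation: candidate design vectors are scored by a cheap reward model instead of full simulation rollouts. The `reward_model` below is a hypothetical frozen scorer; in the paper, it is trained on pre-collected teleoperation data, and the design space covers block-wise, finger-wise, and tendon-routing parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 8                                   # e.g., per-finger stiffness/routing knobs

def reward_model(designs):
    # Hypothetical frozen scorer: peak reward at a known target design.
    target = np.linspace(-1.0, 1.0, dim)
    return -np.sum((designs - target) ** 2, axis=1)

mu, sigma = np.zeros(dim), np.ones(dim)   # design distribution to be learned
for _ in range(25):
    designs = rng.normal(mu, sigma, size=(128, dim))
    scores = reward_model(designs)        # cheap: no simulator in the loop
    elite = designs[np.argsort(scores)[-16:]]          # top 12.5% of candidates
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3

print("learned design mean:", np.round(mu, 2))
```

Because every CEM iteration only queries the reward model, the expensive simulator (or hardware) is needed solely to validate the final design distribution, which is where the reported reduction in design evaluations comes from.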
- North America > Canada > Alberta (0.14)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
Boosting MCTS with Free Energy Minimization
Dao, Mawaba Pascal, Peter, Adrian M.
Active Inference, grounded in the Free Energy Principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework that integrates Monte Carlo Tree Search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS, already renowned for its search efficiency, can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the Cross-Entropy Method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning without sacrificing computational tractability. Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both standalone CEM and MCTS with random rollouts.
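The root-node piece of this planner can be illustrated with a toy sketch (the tree-expansion side is omitted): CEM optimizes an action sequence under a negative-free-energy-style score that blends extrinsic reward with an epistemic bonus. Everything here is a hypothetical stand-in: `rollout` is a toy linear dynamics model, and the bonus uses disagreement between two ensemble members, a common proxy for expected information gain.

```python
import numpy as np

rng = np.random.default_rng(2)
horizon, act_dim = 10, 2

def rollout(actions, w):
    # Hypothetical linear dynamics; `w` differs per ensemble member.
    x, reward, states = np.zeros(2), 0.0, []
    for a in actions:
        x = 0.9 * x + w @ a
        reward += -np.sum((x - 1.0) ** 2)     # extrinsic: reach state (1, 1)
        states.append(x.copy())
    return reward, np.array(states)

ensemble = [np.eye(2) * 0.5, np.eye(2) * 0.6]  # two disagreeing dynamics models

def neg_free_energy(actions, beta=0.1):
    (r1, s1), (r2, s2) = (rollout(actions, w) for w in ensemble)
    extrinsic = 0.5 * (r1 + r2)
    epistemic = np.sum((s1 - s2) ** 2)        # disagreement ~ information gain
    return extrinsic + beta * epistemic       # maximize = minimize free energy

mu, sigma = np.zeros((horizon, act_dim)), np.ones((horizon, act_dim))
for _ in range(20):
    seqs = rng.normal(mu, sigma, size=(64, horizon, act_dim))
    scores = np.array([neg_free_energy(s) for s in seqs])
    elite = seqs[np.argsort(scores)[-8:]]
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3

print("first planned action:", np.round(mu[0], 3))
```

The `beta` weight plays the role of the exploration trade-off in the abstract: with `beta = 0` the planner collapses to plain reward-seeking CEM, while larger values push it toward states where the (hypothetical) ensemble is uncertain.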
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning
Huang, Kevin, Lale, Sahin, Rosolia, Ugo, Shi, Yuanyuan, Anandkumar, Anima
Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings. These zeroth-order optimizers require sampling a large number of trajectory rollouts to select an optimal action, which scales poorly for long prediction horizons or high-dimensional action spaces. First-order methods that use the gradients of the rewards with respect to the actions as an update can mitigate this issue, but suffer from local optima due to the non-convex optimization landscape. To overcome these issues and achieve the best of both worlds, we propose a novel planner, Cross-Entropy Method with Gradient Descent (CEM-GD), that combines first-order methods with CEM. At the beginning of execution, CEM-GD uses CEM to sample a large number of trajectory rollouts to explore the optimization landscape and avoid poor local minima. It then uses the top trajectories as initializations for gradient descent and applies gradient updates to each of these trajectories to find the optimal action sequence. At each subsequent time step, however, CEM-GD samples far fewer trajectories from CEM before applying gradient updates. We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant, small number of samples by using the gradient information, while avoiding local optima using the initially well-sampled trajectories. Furthermore, CEM-GD achieves better performance than CEM on a variety of continuous control benchmarks in MuJoCo with 100x fewer samples per time step, resulting in around 25% less computation time and 10% less memory usage. The implementation of CEM-GD is available at https://github.com/KevinHuang8/CEM-GD.
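A toy version of the two-stage recipe, not the released implementation linked above, looks as follows: a broad CEM pass explores the landscape once, then the top trajectories are refined with gradient ascent. The quadratic `reward` and its analytic `reward_grad` are hypothetical stand-ins for a differentiable learned dynamics and reward model.

```python
import numpy as np

rng = np.random.default_rng(3)
horizon, act_dim = 15, 3
target = np.ones((horizon, act_dim)) * 0.3     # hypothetical optimal action sequence

def reward(seq):
    return -np.sum((seq - target) ** 2)

def reward_grad(seq):                          # analytic d(reward)/d(actions)
    return -2.0 * (seq - target)

# Stage 1: zeroth-order exploration with CEM (one broad sampling pass).
mu, sigma = np.zeros((horizon, act_dim)), np.ones((horizon, act_dim))
seqs = rng.normal(mu, sigma, size=(256, horizon, act_dim))
scores = np.array([reward(s) for s in seqs])
top = seqs[np.argsort(scores)[-5:]]            # a few good initializations

# Stage 2: first-order refinement of each elite trajectory.
lr = 0.1
for _ in range(50):
    top = top + lr * np.array([reward_grad(s) for s in top])

best = max(top, key=reward)
print("refined reward:", round(float(reward(best)), 4))
```

The division of labor matches the abstract: the sampling stage buys robustness to poor local minima, and the gradient stage buys sample efficiency, which is why subsequent time steps can get away with far fewer CEM samples.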
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Planning with Learned Dynamic Model for Unsupervised Point Cloud Registration
Jiang, Haobo, Xie, Jin, Qian, Jianjun, Yang, Jian
Point cloud registration is a fundamental problem in 3D computer vision. In this paper, we cast point cloud registration as a planning problem in reinforcement learning, which seeks the transformation between the source and target point clouds through trial and error. By modeling the point cloud registration process as a Markov decision process (MDP), we develop a latent dynamic model of point clouds consisting of a transformation network and an evaluation network. The transformation network predicts the new feature of the point cloud after a rigid transformation (i.e., an action) is applied to it, while the evaluation network predicts the alignment precision between the transformed source point cloud and the target point cloud as the reward signal. Once the dynamic model of the point cloud is trained, we employ the cross-entropy method (CEM) to iteratively update the planning policy by maximizing the rewards in the point cloud registration process. Thus, the optimal policy, i.e., the transformation between the source and target point clouds, can be obtained by gradually narrowing the search space of transformations. Experimental results on the ModelNet40 and 7Scenes benchmark datasets demonstrate that our method yields good registration performance in an unsupervised manner.
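The CEM planning loop over rigid transformations can be sketched minimally, assuming for illustration a 1-DoF rotation and known point correspondences; the `alignment_reward` below stands in for the paper's learned evaluation network, which would supply the reward without ground-truth correspondences.

```python
import numpy as np

rng = np.random.default_rng(4)

def rot_z(theta):                          # 1-DoF rotation for brevity
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Synthetic registration problem with a known ground-truth transformation.
source = rng.normal(size=(100, 3))
true_t = np.array([0.2, -0.1, 0.3])
target = source @ rot_z(0.4).T + true_t

def alignment_reward(params):
    # Stand-in for the evaluation network: negative mean squared distance
    # between the transformed source and the target (known correspondences).
    theta, t = params[0], params[1:]
    aligned = source @ rot_z(theta).T + t
    return -np.mean(np.sum((aligned - target) ** 2, axis=1))

mu = np.zeros(4)                                   # [theta, tx, ty, tz]
sigma = np.array([1.0, 0.5, 0.5, 0.5])
for _ in range(30):
    cands = rng.normal(mu, sigma, size=(128, 4))
    rewards = np.array([alignment_reward(c) for c in cands])
    elite = cands[np.argsort(rewards)[-13:]]       # top ~10% of candidates
    mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-4  # narrow the search

print("recovered [theta, t]:", np.round(mu, 3))    # should approach [0.4, 0.2, -0.1, 0.3]
```

The shrinking `sigma` is exactly the "gradually narrowing the search space of transformations" described in the abstract: each refit concentrates the sampling distribution around transformations that the (here hypothetical) reward judges well aligned.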
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)